{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c0d8b66c",
   "metadata": {},
   "source": [
    "# WandbCallbackHandler Demo\n",
    "\n",
    "[Weights & Biases Prompts](https://docs.wandb.ai/guides/prompts) is a suite of LLMOps tools built for the development of LLM-powered applications.\n",
    "\n",
    "The `WandbCallbackHandler` is integrated with W&B Prompts to visualize and inspect the execution flow of your index construction, or querying over your index and more. You can use this handler to persist your created indices as W&B Artifacts allowing you to version control your indices.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "612f35ad",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paste your OpenAI key from: https://platform.openai.com/account/api-keys\n",
      "········\n",
      "OpenAI API key configured\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "if os.getenv(\"OPENAI_API_KEY\") is None:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = getpass(\n",
    "        \"Paste your OpenAI key from: https://platform.openai.com/account/api-keys\\n\"\n",
    "    )\n",
    "assert os.getenv(\"OPENAI_API_KEY\", \"\").startswith(\n",
    "    \"sk-\"\n",
    "), \"This doesn't look like a valid OpenAI API key\"\n",
    "print(\"OpenAI API key configured\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "78a29d9a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.callbacks import CallbackManager, CBEventType\n",
    "from llama_index.callbacks import LlamaDebugHandler, WandbCallbackHandler\n",
    "from llama_index import (\n",
    "    GPTListIndex,\n",
    "    GPTTreeIndex,\n",
    "    GPTVectorStoreIndex,\n",
    "    ServiceContext,\n",
    "    SimpleDirectoryReader,\n",
    "    LLMPredictor,\n",
    "    GPTSimpleKeywordTableIndex,\n",
    "    StorageContext,\n",
    ")\n",
    "from llama_index.indices.composability import ComposableGraph\n",
    "from llama_index import load_index_from_storage, load_graph_from_storage\n",
    "from llama_index.llms import OpenAI"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e6feb252",
   "metadata": {},
   "source": [
    "# Setup LLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d22fee33",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(model=\"gpt-4\", temperature=0)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8790f4c7",
   "metadata": {},
   "source": [
    "# W&B Callback Manager Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "13f2b759",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Streaming LlamaIndex events to W&B at https://wandb.ai/ayut/llamaindex/runs/js7j48l9\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: `WandbCallbackHandler` is currently in beta.\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: Please report any issues to https://github.com/wandb/wandb/issues with the tag `llamaindex`.\n"
     ]
    }
   ],
   "source": [
    "llama_debug = LlamaDebugHandler(print_trace_on_end=True)\n",
    "\n",
    "# wandb.init args\n",
    "run_args = dict(\n",
    "    project=\"llamaindex\",\n",
    ")\n",
    "\n",
    "wandb_callback = WandbCallbackHandler(run_args=run_args)\n",
    "\n",
    "callback_manager = CallbackManager([llama_debug, wandb_callback])\n",
    "service_context = ServiceContext.from_defaults(\n",
    "    callback_manager=callback_manager, llm=llm\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c4cf969a",
   "metadata": {},
   "source": [
    "> After running the above cell, you will get the W&B run page URL. Here you will find a trace table with all the events tracked using [Weights and Biases' Prompts](https://docs.wandb.ai/guides/prompts) feature."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a4a7c101",
   "metadata": {},
   "source": [
    "# 1. Indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d1011596",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = SimpleDirectoryReader(\"../data/paul_graham/\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "d3d6975c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**********\n",
      "Trace: index_construction\n",
      "    |_node_parsing ->  0.130602 seconds\n",
      "      |_chunking ->  0.130149 seconds\n",
      "    |_embedding ->  1.556218 seconds\n",
      "    |_embedding ->  1.543419 seconds\n",
      "**********\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 150182 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 154176 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 150182 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 154176 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 150112 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: Logged trace tree to W&B.\n"
     ]
    }
   ],
   "source": [
    "index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0a948efc",
   "metadata": {},
   "source": [
    "## 1.1 Persist Index as W&B Artifacts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8ad58e67",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/ayushthakur/integrations/llamaindex/llama_index/docs/examples/callbacks/wandb/run-20230607_012558-js7j48l9/files/storage)... Done. 0.0s\n"
     ]
    }
   ],
   "source": [
    "wandb_callback.persist_index(index, index_name=\"simple_vector_store\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7ed156a6",
   "metadata": {},
   "source": [
    "## 1.2 Download Index from W&B Artifacts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "dc35f448",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m:   3 of 3 files downloaded.  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**********\n",
      "Trace: index_construction\n",
      "**********\n"
     ]
    }
   ],
   "source": [
    "storage_context = wandb_callback.load_storage_context(\n",
    "    artifact_url=\"ayut/llamaindex/simple_vector_store:v0\"\n",
    ")\n",
    "\n",
    "# Load the index and initialize a query engine\n",
    "index = load_index_from_storage(storage_context, service_context=service_context)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ae4de4a9",
   "metadata": {},
   "source": [
    "# 2. Query Over Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "42221465",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**********\n",
      "Trace: query\n",
      "    |_query ->  6.765974 seconds\n",
      "      |_retrieve ->  0.404373 seconds\n",
      "        |_embedding ->  0.394931 seconds\n",
      "      |_synthesize ->  6.361441 seconds\n",
      "        |_llm ->  6.334911 seconds\n",
      "**********\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Logged trace tree to W&B.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The context information does not provide details about what the author did growing up.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author do growing up?\")\n",
    "print(response, sep=\"\\n\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d7250272",
   "metadata": {},
   "source": [
    "# 3. Build Complex Indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "3a5f2671",
   "metadata": {},
   "outputs": [],
   "source": [
    "# fetch \"New York City\" page from Wikipedia\n",
    "from pathlib import Path\n",
    "\n",
    "import requests\n",
    "\n",
    "response = requests.get(\n",
    "    \"https://en.wikipedia.org/w/api.php\",\n",
    "    params={\n",
    "        \"action\": \"query\",\n",
    "        \"format\": \"json\",\n",
    "        \"titles\": \"New York City\",\n",
    "        \"prop\": \"extracts\",\n",
    "        \"explaintext\": True,\n",
    "    },\n",
    ").json()\n",
    "page = next(iter(response[\"query\"][\"pages\"].values()))\n",
    "nyc_text = page[\"extract\"]\n",
    "\n",
    "data_path = Path(\"data\")\n",
    "if not data_path.exists():\n",
    "    Path.mkdir(data_path)\n",
    "\n",
    "with open(\"data/nyc_text.txt\", \"w\") as fp:\n",
    "    fp.write(nyc_text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "cf0c0307",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load NYC dataset\n",
    "nyc_documents = SimpleDirectoryReader(\"data/\").load_data()\n",
    "# load PG's essay\n",
    "essay_documents = SimpleDirectoryReader(\"../paul_graham_essay/data/\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "0c2dbdea",
   "metadata": {},
   "outputs": [],
   "source": [
    "# While building a composable index, to correctly save the index,\n",
    "# the same `storage_context` needs to be passed to every index.\n",
    "storage_context = StorageContext.from_defaults()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "1d795f6f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**********\n",
      "Trace: index_construction\n",
      "    |_node_parsing ->  0.195381 seconds\n",
      "      |_chunking ->  0.194719 seconds\n",
      "    |_embedding ->  1.223846 seconds\n",
      "    |_embedding ->  1.277199 seconds\n",
      "    |_embedding ->  1.101784 seconds\n",
      "    |_embedding ->  0.580571 seconds\n",
      "**********\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 261728 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 268750 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 261728 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 268750 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Serializing object of type str that is 261658 bytes\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: Logged trace tree to W&B.\n"
     ]
    }
   ],
   "source": [
    "# build NYC index\n",
    "nyc_index = GPTVectorStoreIndex.from_documents(\n",
    "    nyc_documents, service_context=service_context, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "e9a49c5a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**********\n",
      "Trace: index_construction\n",
      "    |_node_parsing ->  2.4e-05 seconds\n",
      "**********\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Logged trace tree to W&B.\n"
     ]
    }
   ],
   "source": [
    "# build essay index\n",
    "essay_index = GPTVectorStoreIndex.from_documents(\n",
    "    essay_documents, service_context=service_context, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "60aa7e5f",
   "metadata": {},
   "source": [
    "## 3.1. Query Over Graph Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "d2704b34",
   "metadata": {},
   "outputs": [],
   "source": [
    "nyc_index_summary = \"\"\"\n",
    "    New York, often called New York City or NYC, \n",
    "    is the most populous city in the United States. \n",
    "    With a 2020 population of 8,804,190 distributed over 300.46 square miles (778.2 km2), \n",
    "    New York City is also the most densely populated major city in the United States, \n",
    "    and is more than twice as populous as second-place Los Angeles. \n",
    "    New York City lies at the southern tip of New York State, and \n",
    "    constitutes the geographical and demographic center of both the \n",
    "    Northeast megalopolis and the New York metropolitan area, the \n",
    "    largest metropolitan area in the world by urban landmass.[8] With over \n",
    "    20.1 million people in its metropolitan statistical area and 23.5 million \n",
    "    in its combined statistical area as of 2020, New York is one of the world's \n",
    "    most populous megacities, and over 58 million people live within 250 mi (400 km) of \n",
    "    the city. New York City is a global cultural, financial, and media center with \n",
    "    a significant influence on commerce, health care and life sciences, entertainment, \n",
    "    research, technology, education, politics, tourism, dining, art, fashion, and sports. \n",
    "    Home to the headquarters of the United Nations, \n",
    "    New York is an important center for international diplomacy,\n",
    "    an established safe haven for global investors, and is sometimes described as the capital of the world.\n",
    "\"\"\"\n",
    "essay_index_summary = \"\"\"\n",
    "    Author: Paul Graham. \n",
    "    The author grew up painting and writing essays. \n",
    "    He wrote a book on Lisp and did freelance Lisp hacking work to support himself. \n",
    "    He also became the de facto studio assistant for Idelle Weber, an early photorealist painter. \n",
    "    He eventually had the idea to start a company to put art galleries online, but the idea was unsuccessful. \n",
    "    He then had the idea to write software to build online stores, which became the basis for his successful company, Viaweb. \n",
    "    After Viaweb was acquired by Yahoo!, the author returned to painting and started writing essays online. \n",
    "    He wrote a book of essays, Hackers & Painters, and worked on spam filters. \n",
    "    He also bought a building in Cambridge to use as an office. \n",
    "    He then had the idea to start Y Combinator, an investment firm that would \n",
    "    make a larger number of smaller investments and help founders remain as CEO. \n",
    "    He and his partner Jessica Livingston ran Y Combinator and funded a batch of startups twice a year. \n",
    "    He also continued to write essays, cook for groups of friends, and explore the concept of invented vs discovered in software. \n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "353a644b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**********\n",
      "Trace: graph_construction\n",
      "**********\n"
     ]
    }
   ],
   "source": [
    "from llama_index import StorageContext, load_graph_from_storage\n",
    "\n",
    "graph = ComposableGraph.from_indices(\n",
    "    GPTSimpleKeywordTableIndex,\n",
    "    [nyc_index, essay_index],\n",
    "    index_summaries=[nyc_index_summary, essay_index_summary],\n",
    "    max_keywords_per_chunk=50,\n",
    "    service_context=service_context,\n",
    "    storage_context=storage_context,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cda70171",
   "metadata": {},
   "source": [
    "### 3.1.1 Persist Composable Index as W&B Artifacts "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "4c3ebaf7",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/ayushthakur/integrations/llamaindex/llama_index/docs/examples/callbacks/wandb/run-20230607_012558-js7j48l9/files/storage)... Done. 0.0s\n"
     ]
    }
   ],
   "source": [
    "wandb_callback.persist_index(graph, index_name=\"composable_graph\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ff60da73",
   "metadata": {},
   "source": [
    "### 3.1.2 Download Index from W&B Artifacts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "8ce7ecb3",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m:   3 of 3 files downloaded.  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**********\n",
      "Trace: index_construction\n",
      "**********\n",
      "**********\n",
      "Trace: index_construction\n",
      "**********\n",
      "**********\n",
      "Trace: index_construction\n",
      "**********\n"
     ]
    }
   ],
   "source": [
    "storage_context = wandb_callback.load_storage_context(\n",
    "    artifact_url=\"ayut/llamaindex/composable_graph:v0\"\n",
    ")\n",
    "\n",
    "# Load the graph and initialize a query engine\n",
    "graph = load_graph_from_storage(\n",
    "    storage_context, root_id=graph.root_id, service_context=service_context\n",
    ")\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b30ddfc9",
   "metadata": {},
   "source": [
    "### 3.1.3 Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "e852e00f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**********\n",
      "Trace: query\n",
      "    |_query ->  5.134109 seconds\n",
      "      |_retrieve ->  0.706585 seconds\n",
      "        |_embedding ->  0.68933 seconds\n",
      "      |_synthesize ->  4.427289 seconds\n",
      "        |_llm ->  4.39144 seconds\n",
      "**********\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Logged trace tree to W&B.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The context information does not provide details about the climate of New York City or how cold it is during the winter.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"What is the climate of New York City like? How cold is it during the winter?\",\n",
    ")\n",
    "print(response, sep=\"\\n\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c49ff101",
   "metadata": {},
   "source": [
    "### Close W&B Callback Handler\n",
    "\n",
    "When we are done tracking our events we can close the wandb run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "28ef6a7b",
   "metadata": {},
   "outputs": [],
   "source": [
    "wandb_callback.finish()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
